Goto

Collaborating Authors

 gradient descent and momentum


On discretisation drift and smoothness regularisation in neural network training

arXiv.org Machine Learning

The deep learning recipe of casting real-world problems as mathematical optimisation and tackling the optimisation by training deep neural networks using gradient-based optimisation has undoubtedly proven to be a fruitful one. The understanding behind why deep learning works, however, has lagged behind its practical significance. We aim to make steps towards an improved understanding of deep learning with a focus on optimisation and model regularisation. We start by investigating gradient descent (GD), a discrete-time algorithm at the basis of most popular deep learning optimisation algorithms. Understanding the dynamics of GD has been hindered by the presence of discretisation drift, the numerical integration error between GD and its often studied continuous-time counterpart, the negative gradient flow (NGF). To add to the toolkit available to study GD, we derive novel continuous-time flows that account for discretisation drift. Unlike the NGF, these new flows can be used to describe learning rate specific behaviours of GD, such as training instabilities observed in supervised learning and two-player games. We then translate insights from continuous time into mitigation strategies for unstable GD dynamics, by constructing novel learning rate schedules and regularisers that do not require additional hyperparameters. Like optimisation, smoothness regularisation is another pillar of deep learning's success with wide use in supervised learning and generative modelling. Despite their individual significance, the interactions between smoothness regularisation and optimisation have yet to be explored. We find that smoothness regularisation affects optimisation across multiple deep learning domains, and that incorporating smoothness regularisation in reinforcement learning leads to a performance boost that can be recovered using adaptions to optimisation methods.


Gentle Introduction to Gradient Descent and Momentum

#artificialintelligence

In this article, we will talk about a fundamental concept in machine learning called the Gradient Descent. The gradient descent is one of the most popular algorithms that tends to reduce the error in prediction i.e minimizing your cost function. This might have been confusing but that's okay, before we jump into more details I'll give a very small gist of where it is mostly used. In deep learning, we have a concept called backpropagation. Wikipedia says " backpropagation computes the gradient of the loss function with respect to the weights of the network for a single input–output example, and does so efficiently, unlike a naïve direct computation of the gradient with respect to each weight individually" I had a brain-freeze when I read this, so let me give you an intuitive example to help you understand better.